Network analysis of online bidding activity

With the advent of digital media, people are increasingly resorting to online channels for commercial transactions. The online auction is a prototypical example. In such online transactions, the pattern of bidding activity is more complex than in traditional offline transactions, because the number of bidders participating in a given transaction is not bounded and the bidders can respond to the bidding instantaneously. By using recently developed network theory, we study the interaction patterns between bidders (items) who (that) are connected when they bid for the same item (if the item is bid for by the same bidder). The resulting network is analyzed by using the hierarchical clustering algorithm that is used for the clustering analysis of expression data from DNA microarrays. A dendrogram is constructed for the item subcategories and compared with a traditional classification scheme. The implication of the difference between the two is discussed.



PACS numbers: 89.75.Hc, 89.65.Gh, 89.75.-k



I. INTRODUCTION 



Electronic commerce (e-commerce) refers to any type of business or commercial transaction that involves information transfer across the Internet. The online auction, a synergetic combination of the Internet, supported by instantaneous interactions, and traditional auction mechanisms, has rapidly expanded over the last decade. Owing to this rapid expansion and the importance of online auctions, researchers have very recently begun to pay attention to various aspects of online auctions [1, 2, 3, 4, 5, 6]. According to recent studies based on empirical data obtained from eBay.com, the online auction system is driven by a self-organized process involving almost all the agents that participate in a given auction activity. For example, the total number of bids placed for a single item or category and the bid frequency submitted by each agent follow power-law distributions [5]. Further, the bidding process occurring in online auctions has been successfully described through a stochastic rate equation [6]. Thus, understanding the bidding activities in online auctions is a highly attractive topic for the statistical physics community.

The remarkable connection between beer and diapers discovered in 1992 by Blischok et al. [7] significantly improved profits. They analyzed the correlation between items sold at a drug store during the time interval between 5 p.m. and 7 p.m. and found a strong correlation between the two items, which the retailer had never noticed earlier. This correlation arises from the fact that fathers tend to buy beer when their wives ask them to buy diapers on the way home. This discovery, which is considered a pioneering work of data mining, compelled drug stores to redesign their displays, which resulted in an increase in beer sales.



In online auctions, most of the limitations hampering traditional offline auctions, such as spatial and temporal constraints, have virtually disappeared. Thus, it would be interesting to investigate how the bidding pattern of online auctions has changed from the traditional one. On the other hand, considerable attention has recently been focused on complex network problems as an interdisciplinary subject [8, 9, 10, 11]. Diverse computational methods to find clusters within large-scale networks have been introduced (for example, see Refs. [12, 13, 14, 15, 16]). Thus, by combining these two issues, in this study we investigate the pattern emerging from the interactions between individual bidders or items in online auctions by using recently developed network theory. The resulting network provides information on the bidding pattern of individual bidders as well as the correlation between different item subcategories. Moreover, we construct a dendrogram for these subcategories and compare it with a traditional classification scheme based on off-line transactions. For this purpose, we use an algorithm applied to the clustering analysis of expression data from DNA microarray experiments in biological systems [17]. The dendrogram thus obtained is consumer-oriented, reflecting the pattern of an individual bidder's activities. Thus, it can be used for increasing profits by providing consumers with links between items that should interest them.



Our study is based on empirical data [5] collected from http://www.eBay.com. The dataset comprises all the auctions that ended in a single day, July 5, 2001, and includes 264,073 auctioned items grouped into 18 categories and 192 subcategories. The number of distinct agents that participated in these auctions was 384,058.




FIG. 1: A schematic illustration of a bipartite network of an online auction. Bidders and items are represented by ellipses with {A,B,...,F} and squares with {a,b,c,d}, respectively. Bidders A and B are connected via item a, which they both bid for. Items a and b are connected via bidders C and E, who bid for both items a and b.

                    $N$          $L$          $C_{\rm iso}$
Bidder network      338,478      1,208,236    22,883
Item network        122,827      813,687      3,851

TABLE I: The numbers of vertices $N$, edges $L$, and isolated clusters $C_{\rm iso}$ for the bidder and the item networks.



II. TOPOLOGIES OF BIDDER AND ITEM 
NETWORKS 

The data contain the information on which bidder bids for which item via their unique user ID. Thus, we can construct a bipartite network comprising two disjoint sets of vertices, bidders and items, as shown in Fig. 1. The bipartite network can be converted to a single-species network, namely the bidder or the item network, as shown in Figs. 2(a) and 2(b), respectively. The bidder and item networks can have weighted edges. For example, bidders C and D in Fig. 1 are connected twice through items a and c. Hence, the edge between C and D has weight 2. Similarly, items a and b are connected twice through bidders C and E. Thus, the edge between vertices a and b in the item network has weight 2. Statistics describing the topology of the entire network and the giant component of the bidder and the item network are listed in Table I and Table II, respectively.
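
The projection described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming that the bids are available as (bidder, item) pairs; the function name `project` and the toy bid list are hypothetical and reproduce neither the eBay data nor the exact edges of Fig. 1.

```python
from collections import defaultdict
from itertools import combinations

def project(bids):
    """Project (bidder, item) pairs onto weighted bidder and item networks.

    Each network is returned as a dict mapping a sorted vertex pair to its weight.
    """
    items_of = defaultdict(set)    # bidder -> items the bidder bid for
    bidders_of = defaultdict(set)  # item -> bidders who bid for it
    for bidder, item in bids:
        items_of[bidder].add(item)
        bidders_of[item].add(bidder)

    bidder_net = defaultdict(int)
    for bidders in bidders_of.values():
        for u, v in combinations(sorted(bidders), 2):
            bidder_net[(u, v)] += 1  # one unit of weight per shared item

    item_net = defaultdict(int)
    for items in items_of.values():
        for a, b in combinations(sorted(items), 2):
            item_net[(a, b)] += 1    # one unit of weight per shared bidder

    return dict(bidder_net), dict(item_net)

# Hypothetical toy bids (assumption, for illustration only).
bids = [("A", "a"), ("B", "a"), ("C", "a"), ("D", "a"), ("E", "a"),
        ("C", "b"), ("E", "b"), ("C", "c"), ("D", "c")]
bidder_net, item_net = project(bids)
```

In this toy example, bidders C and D share items a and c, so their edge has weight 2, and items a and b share bidders C and E, so their edge also has weight 2.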

Next, we characterize the structure of the bidder and item networks. First, we regard each network as a binary network, neglecting the weight of each edge. The network configuration can be described by the adjacency matrix $\{a_{ij}\}$; its component is 1 when two vertices $i$ and $j$ are connected and 0 otherwise. Then, the degree $k_i$ of vertex $i$ is $k_i = \sum_{j=1}^{N} a_{ij}$, which is the number of edges connected to it. We find that the degree distribution exhibits a power-law behavior asymptotically for both the bidder and item networks, $P_d(k) \sim k^{-\gamma}$. The degree exponent $\gamma$ is estimated to be $\gamma_B \approx 3.0$ for the bidder network and $\gamma_I \approx 2.0$ for the item network, as shown in Fig. 3.

FIG. 2: Bidder network (a) and item network (b) converted from the bipartite network shown in Fig. 1. Thick edges have weight 2, while the other edges have unit weight for both (a) and (b).

                    $N$              $L$                $\langle k \rangle$    $\langle d \rangle$
Bidder network      267,414 (79%)    2,245,794 (93%)    8.4                    8.15
Item network        112,240 (91%)    695,281 (85%)      12.4                   7.69

TABLE II: Statistics of the giant component of the bidder and item networks. The number of vertices is denoted by $N$; edges, $L$; mean degree, $\langle k \rangle$; and mean distance between two vertices, $\langle d \rangle$.
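
As a small illustration of the definitions of the degree $k_i$ and the degree distribution $P_d(k)$, the following Python sketch counts degrees in a binary network and normalizes the resulting histogram. The function names and the four-vertex toy edge list are assumptions introduced here; they are not taken from the eBay data.

```python
from collections import Counter

def degrees(edges):
    """Degree k_i = sum_j a_ij for a binary network given as distinct unordered edges."""
    k = Counter()
    for u, v in edges:
        k[u] += 1
        k[v] += 1
    return dict(k)

def degree_distribution(k):
    """Fraction P_d(k) of vertices that have degree k."""
    counts = Counter(k.values())
    n = len(k)
    return {deg: c / n for deg, c in sorted(counts.items())}

# Hypothetical toy network (assumption, for illustration only).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
k = degrees(edges)
P_d = degree_distribution(k)
```

For the large networks studied here, $P_d(k)$ would be inspected on a double-logarithmic scale to read off the exponent $\gamma$; the toy network is far too small for that.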

Second, the strength $s_i$ of vertex $i$ is the sum of the weights of the edges connected to it. That is, $s_i = \sum_{j} a_{ij} w_{ij}$, where $w_{ij}$ is the weight of the edge between vertices $i$ and $j$. The strength distributions of the bidder and item networks also exhibit power-law behaviors asymptotically as $P_s(s) \sim (s + s_0)^{-\eta}$, where $\eta_B \approx 4.0$ for the bidder network and $\eta_I \approx 3.5$ for the item network, as shown in Fig. 4. Here, $s_0$ is a constant. The strength and degree of a given vertex exhibit an almost linear relationship, $s(k) \sim k^{\zeta}$ with $\zeta \approx 0.95$; however, large fluctuations are observed for large $k$ in Fig. 5.
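
The strength definition and the scaling relation $s(k) \sim k^{\zeta}$ can be illustrated as follows. The text does not state how $\zeta$ was estimated, so the least-squares fit on a double-logarithmic scale used below is an assumption, as are the function names and the toy data.

```python
from collections import defaultdict
from math import log

def strengths(weighted_edges):
    """Strength s_i = sum_j a_ij * w_ij for edges given as {(i, j): w_ij}."""
    s = defaultdict(int)
    for (i, j), w in weighted_edges.items():
        s[i] += w
        s[j] += w
    return dict(s)

def fit_exponent(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Hypothetical toy network (assumption, for illustration only).
weighted_edges = {("a", "b"): 2, ("a", "c"): 2, ("b", "c"): 1, ("c", "d"): 1}
s = strengths(weighted_edges)

# Synthetic check: points lying exactly on s = 2 k^0.95 return the exponent.
ks = [1, 2, 4, 8, 16]
zeta = fit_exponent(ks, [2 * k ** 0.95 for k in ks])
```

With real data, the fit would typically be restricted to the range of $k$ where fluctuations are small, or applied to log-binned averages.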

Third, we measure the mean nearest-neighbor degree function $\langle k_{nn} \rangle (k)$. The mean degree of the nearest-neighbor vertices of a given vertex $i$ with degree $k$ is measured as follows:

$$ k_{nn,i} = \frac{1}{k_i} \sum_{j=1}^{N} a_{ij} k_j . \qquad (1) $$

The average of $k_{nn,i}$ over the centered vertices with degree $k$ is taken to obtain $\langle k_{nn} \rangle (k)$. For the weighted network, formula (1) is replaced by the following formula [18]:

$$ k^{(w)}_{nn,i} = \frac{1}{s_i} \sum_{j=1}^{N} a_{ij} w_{ij} k_j . \qquad (2) $$

From this equation, $\langle k^{(w)}_{nn} \rangle (k)$ can be similarly obtained.

FIG. 3: Degree distribution $P_d(k)$ as a function of degree $k$. Both networks display power-law behaviors $P_d(k) \sim k^{-\gamma}$ with $\gamma_B \approx 3.0$ for the bidder network and $\gamma_I \approx 2.0$ for the item network. Solid lines are guidelines with slopes of 3.0 and 2.0 for the bidder and item networks, respectively.

FIG. 4: Strength distributions $P_s(s)$ as a function of strength $s$ for the bidder and item networks. Asymptotically, they display a generalized power-law behavior $P_s(s) \sim (s + s_0)^{-\eta}$. The exponent is estimated to be $\eta_B \approx 4.0$ for the bidder network and $\eta_I \approx 3.5$ for the item network; $s_0 = 51$ is used for the bidder network and $s_0 = 52$ for the item network.

FIG. 5: The relation between strength $s$ and degree $k$ of each vertex. They show an almost linear relationship, $s \sim k^{\zeta}$ with $\zeta \approx 0.95$, for both the bidder (○) and the item (□) networks. Inset: Replot of $s$ vs. $k$ using the log-binned data.

FIG. 6: The mean nearest-neighbor degree function $\langle k_{nn} \rangle (k)$ (○) and its weighted version $\langle k^{(w)}_{nn} \rangle (k)$ (□) as a function of the degree $k$ of a centered vertex for the bidder network (a) and the item network (b). The solid line, obtained from a least-squares fit, has a slope of 0.44 for the bidder network (a) and 0.77 for the item network (b). Both networks are assortatively mixed.

It is found that the functions $\langle k_{nn} \rangle (k)$ and $\langle k^{(w)}_{nn} \rangle (k)$ increase with the degree $k$ of the centered vertex for both the bidder and item networks, irrespective of whether the binary or the weighted version is used. That is, both networks are assortatively mixed, implying that active bidders tend to bid simultaneously for common items; attractive items are thereby also connected via such active bidders.

Fourth, the local clustering coefficient $c_i$ is the density of transitive relationships. It is defined as the number of triangles cornered at vertex $i$, which are formed by its neighbors, divided by the maximum possible number of such triangles, $k_i(k_i - 1)/2$. That is,

$$ c_i = \frac{1}{k_i (k_i - 1)} \sum_{j,h} a_{ij} a_{ih} a_{jh} . \qquad (3) $$

The average of $c_i$ over the vertices with degree $k$ is called the clustering coefficient function $c(k)$. For weighted networks, a similar clustering coefficient $c^{(w)}_i$ is defined [18] as

$$ c^{(w)}_i = \frac{1}{s_i (k_i - 1)} \sum_{j,h} \frac{w_{ij} + w_{ih}}{2} a_{ij} a_{ih} a_{jh} . \qquad (4) $$

The average of $c^{(w)}_i$ over the cornered vertices with degree $k$ is similarly defined and denoted as $c^{(w)}(k)$. For the bidder network, the clustering coefficient functions $c(k)$ and $c^{(w)}(k)$ decrease with respect to $k$, as shown in Fig. 7(a), and exhibit large fluctuations for large $k$; this implies that the bidder network is hierarchically organized. For the item network, however, both $c(k)$ and $c^{(w)}(k)$ are almost independent of $k$, as shown in Fig. 7(b); this implies that the network is almost randomly organized. Such behaviors are observed irrespective of whether the networks are binary or weighted.

FIG. 7: Average clustering coefficient functions $c(k)$ (○) and $c^{(w)}(k)$ (□) as a function of degree $k$ for the bidder network (a) and the item network (b). The result implies that the bidder network is hierarchically organized, whereas the item network is almost random.
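
A direct, unoptimized evaluation of the binary and weighted local clustering coefficients is sketched below. The weighted form follows the commonly used definition in which each triangle is weighted by the mean of the two edge weights at the corner vertex; setting the coefficient to zero for vertices with fewer than two neighbors is a convention assumed here, and the toy network is hypothetical.

```python
def clustering(adj):
    """Return (c, cw): binary and weighted local clustering coefficients.

    adj[i][j] = w_ij is a symmetric adjacency dict.
    """
    c, cw = {}, {}
    for i, nbrs in adj.items():
        k_i = len(nbrs)
        if k_i < 2:
            c[i] = cw[i] = 0.0  # convention assumed for k_i < 2
            continue
        s_i = sum(nbrs.values())
        tri = 0      # ordered neighbor pairs (j, h) that are linked
        wtri = 0.0   # same pairs, weighted by (w_ij + w_ih) / 2
        for j in nbrs:
            for h in nbrs:
                if j != h and h in adj[j]:
                    tri += 1
                    wtri += (nbrs[j] + nbrs[h]) / 2
        c[i] = tri / (k_i * (k_i - 1))
        cw[i] = wtri / (s_i * (k_i - 1))
    return c, cw

# Hypothetical toy network (assumption): triangle a-b-c plus a pendant vertex d.
adj = {
    "a": {"b": 2, "c": 2},
    "b": {"a": 2, "c": 1},
    "c": {"a": 2, "b": 1, "d": 1},
    "d": {"c": 1},
}
c, cw = clustering(adj)
```

Because the double loop runs over ordered pairs, every triangle is counted twice, which is why the normalization is $k_i(k_i - 1)$ rather than $k_i(k_i - 1)/2$.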



III. CLUSTER IDENTIFICATION 

By using network analysis, individual elements can be classified into clusters. Here, we apply the hierarchical agglomeration (HA) algorithm, which was introduced by Clauset et al. [13], to the item network containing 264,073 items. The algorithm is particularly useful for a system containing a large number of elements. Clusters identified using this analysis are compared with the traditional subcategories established on the basis of off-line transactions. The difference obtained reflects the pattern of online bidding activities and can be used for reorganizing a dendrogram of item subcategories.

To realize this, we first store the topology of the item network by using the adjacency matrix $\{a_{ij}\}$. While maintaining this information, we delete all the edges, thereby leaving $N$ isolated vertices. At each step, we select from the stored adjacency matrix the one edge that maximizes the change in the modularity, defined as

$$ Q = \sum_{\alpha} \left( e_{\alpha\alpha} - a_{\alpha}^2 \right) , \qquad (5) $$

where $e_{\alpha\alpha}$ is the fraction of the edges that connect vertices within cluster $\alpha$ on both ends of each edge, and $a_{\alpha}$ is the fraction of edges attached on one end or both ends to vertices within cluster $\alpha$. The selected edge is added to the network and eliminated from the stored matrix. We continue this edge-adding process until the modularity reaches its maximum. We find that the modularity reaches the value $Q_{\max} \approx 0.79$ for the item network and $Q_{\max} \approx 0.83$ for the bidder network; this implies that both networks are extremely well categorized. We recognize 1,904 and 870 distinct clusters in the bidder and item networks, respectively. The cluster sizes, that is, the numbers of vertices in each module, are not uniform. The cluster-size distributions for both networks, even though large deviations exist for large cluster size $M$, exhibit fat-tail behaviors such that $P_m(M) \sim M^{-\tau}$ with $\tau_B \approx 2.2$ and $\tau_I \approx 2.1$, roughly. The exponents are estimated from the data in the region with small $M$.

FIG. 8: The evolution of modularity $Q$ during the edge-adding process. The x axis represents the number of edges added. The maximum value obtained is estimated to be $Q_{\max} = 0.83$ for the bidder network (solid line) and $Q_{\max} = 0.79$ for the item network (dotted line).

FIG. 9: The cluster-size distributions for the bidder and item networks, identified using the HA algorithm. The distributions follow the power law $P_m(M) \sim M^{-\tau}$ with $\tau_B \approx 2.2$ and $\tau_I \approx 2.1$. The exponents are estimated from the region with the data in small $M$. Solid and dashed lines are guidelines. The presented data are log-binned. Raw data in the region with large $M$ are sparse.
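
The modularity and a greedy agglomeration can be illustrated with the following naive sketch. It is not the efficient algorithm of Ref. [13]: it recomputes $Q$ from scratch for every candidate merge, it stops at the first step where no merge increases $Q$, and it takes $a_{\alpha}$ to be the fraction of edge ends attached to cluster $\alpha$, which is the usual convention and is assumed here. The function names and the toy network are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum_alpha (e_aa - a_a^2) for an unweighted edge list and a vertex -> cluster map."""
    m = len(edges)
    e_in = defaultdict(float)  # fraction of edges with both ends in the cluster
    a = defaultdict(float)     # fraction of edge ends attached to the cluster
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] += 1 / m
        a[cu] += 0.5 / m
        a[cv] += 0.5 / m
    return sum(e_in[c] - a[c] ** 2 for c in a)

def greedy_agglomeration(edges):
    """Repeatedly merge the pair of linked clusters with the largest gain in Q."""
    community = {v: v for e in edges for v in e}  # start from isolated vertices
    q = modularity(edges, community)
    while True:
        best_gain, best_pair = 0.0, None
        linked = {tuple(sorted((community[u], community[v])))
                  for u, v in edges if community[u] != community[v]}
        for c1, c2 in sorted(linked):
            trial = {v: (c1 if c == c2 else c) for v, c in community.items()}
            gain = modularity(edges, trial) - q
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (c1, c2)
        if best_pair is None:  # no merge increases Q: stop (simplification)
            return community, q
        c1, c2 = best_pair
        community = {v: (c1 if c == c2 else c) for v, c in community.items()}
        q += best_gain

# Hypothetical toy network (assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
community, q_max = greedy_agglomeration(edges)
groups = {frozenset(v for v in community if community[v] == c)
          for c in set(community.values())}
```

For this toy network the procedure recovers the two triangles as clusters, with $Q = 5/14 \approx 0.36$.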



IV. DENDROGRAM BASED ON ONLINE 
TRANSACTIONS 

A. Closeness 

In this section, we focus on the item network. We have identified 870 distinct clusters by using the clustering algorithm. Among them, 49 clusters contain more than 100 items each. On the other hand, according to the traditional classification scheme, items in the eBay auction are categorized into 18 categories, which contain 192 subcategories. Obviously, the clusters that we found are not equivalent to these categories or subcategories. Thus, our goal is to construct a new dendrogram, a hierarchical tree, among the 192 subcategories based on the closeness between the obtained clusters and the existing subcategories.

To illustrate closeness, we select a cluster $\alpha$ and classify the items within the cluster into the 192 subcategories. The fraction of items in each subcategory $\mu$ is the closeness $C_{\alpha\mu}$. For example, Fig. 10 shows the closenesses for the five largest clusters. Each strip represents a cluster obtained from the HA algorithm. For each strip, the x-axis represents the 192 subcategories and the y-axis represents the closeness, which is indicated by the bars. For cluster 1, subcategory (a) exhibits the largest closeness. For cluster 2, subcategory (c) has the largest closeness, and so on. The abbreviations for the 18 main categories are as follows: Antique stands for antiques and art; Biz, business and office; Clothes, clothing and accessories; Collect, collectibles; Comp, computers; Elec, consumer electronics; Dolls, dolls and bears; Home, home and garden; Jewelry, jewelry, gemstones and watches; Glass, pottery and glass; and Estt, real estate.
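
The closeness defined above amounts to a normalized count, as the following sketch shows. The two input dictionaries (item to cluster, item to subcategory), the function name, and the five toy items are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def closeness(cluster_of, subcategory_of):
    """C[alpha][mu]: fraction of the items in cluster alpha that belong to subcategory mu."""
    counts = defaultdict(Counter)
    for item, alpha in cluster_of.items():
        counts[alpha][subcategory_of[item]] += 1
    return {alpha: {mu: n / sum(c.values()) for mu, n in c.items()}
            for alpha, c in counts.items()}

# Hypothetical toy assignment (assumption, for illustration only).
cluster_of = {"i1": 1, "i2": 1, "i3": 1, "i4": 1, "i5": 2}
subcategory_of = {"i1": "Sports", "i2": "Sports", "i3": "Sports",
                  "i4": "Movies", "i5": "Movies"}
C = closeness(cluster_of, subcategory_of)
```

By construction, the closenesses of each cluster sum to one over the subcategories.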



B. Correlation matrix 

To quantify the correlation, we adopt the method used for the clustering analysis of expression data from DNA microarrays. In this approach, we regard the closeness as the expression level, subcategories as genes, and clusters as different DNA microarray experiments [17].

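The correlation matrix element $\rho_{\mu\nu}$ between subcategories $\mu$ and $\nu$ is defined as

$$ \rho_{\mu\nu} = \frac{\langle C_{\alpha\mu} C_{\alpha\nu} \rangle - \langle C_{\alpha\mu} \rangle \langle C_{\alpha\nu} \rangle}{\sqrt{\left( \langle C_{\alpha\mu}^2 \rangle - \langle C_{\alpha\mu} \rangle^2 \right) \left( \langle C_{\alpha\nu}^2 \rangle - \langle C_{\alpha\nu} \rangle^2 \right)}} , \qquad (6) $$

where $C_{\alpha\mu}$ represents the closeness of subcategory $\mu$ ($\mu = 1, \ldots, n = 192$) to cluster $\alpha$ ($\alpha = 1, \ldots, 870$) and $\langle \cdots \rangle$ denotes the average over the different clusters indexed by $\alpha$.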
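
The correlation coefficient between two subcategories is the Pearson correlation of their closeness profiles over the clusters. The sketch below assumes that the closenesses are stored as a dictionary of dictionaries indexed first by cluster and then by subcategory, that a missing entry means zero closeness, and that the coefficient is set to zero when a profile has no variance; these conventions and the toy values are assumptions.

```python
from math import sqrt

def correlation(C, mu, nu):
    """Pearson correlation between the closeness profiles of subcategories mu and nu.

    C[alpha][mu] is the closeness of subcategory mu to cluster alpha;
    the averages run over the clusters alpha.
    """
    x = [row.get(mu, 0.0) for row in C.values()]
    y = [row.get(nu, 0.0) for row in C.values()]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    vx = sum(a * a for a in x) / n - mx * mx
    vy = sum(b * b for b in y) / n - my * my
    if vx * vy <= 0:
        return 0.0  # convention assumed for constant profiles
    return cov / sqrt(vx * vy)

# Hypothetical closeness values for three clusters (assumption).
C = {
    1: {"Sports": 0.75, "Movies": 0.25},
    2: {"Movies": 1.0},
    3: {"Sports": 0.5, "Cards": 0.5},
}
rho_sm = correlation(C, "Sports", "Movies")
```

In this toy example, Sports and Movies tend to populate different clusters, so their correlation is negative.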
Based on the correlation matrix, a dendrogram that assembles all $n = 192$ subcategories into a single tree can be constructed. For this purpose, we use the hierarchical clustering algorithm introduced by Eisen et al. [17]. We start the tree construction process by calculating the correlation coefficients $\{\rho_{\mu\nu}\}$ from the closeness matrix of size 192 × 870. Next, the matrix is scanned to identify the pair of subcategories with the highest correlation coefficient, and the two subcategories are combined. Thus, a pseudo-subcategory is created, and its closeness profile is calculated simply by averaging the closenesses of the combined subcategories. This is referred to as the average-linkage clustering method. Then, $n - 2$ isolated subcategories and one pseudo-subcategory remain. The correlation matrix is updated with these $n - 1$ units, and the highest correlation coefficient is found again. The process is repeated $n - 1$ times until only a single element remains. After these steps, a dendrogram is constructed in which the height represents the magnitude of the correlation coefficient.
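
The tree construction described above can be sketched as follows. The merged profile is taken here as the average of the two joined profiles weighted by the number of subcategories each already contains; this weighting is an assumption, since the text only states that the closenesses are averaged. The function names and the four toy profiles are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx * vy > 0 else 0.0

def average_linkage(profiles):
    """Merge the most correlated pair until one node remains.

    profiles maps a subcategory name to its closeness values over the clusters.
    Returns the list of merges as (left, right, correlation) tuples.
    """
    nodes = {name: (list(p), 1) for name, p in profiles.items()}  # profile, member count
    merges = []
    while len(nodes) > 1:
        labels = sorted(nodes)
        best = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                rho = pearson(nodes[a][0], nodes[b][0])
                if best is None or rho > best[0]:
                    best = (rho, a, b)
        rho, a, b = best
        (pa, na), (pb, nb) = nodes.pop(a), nodes.pop(b)
        merged = [(na * x + nb * y) / (na + nb) for x, y in zip(pa, pb)]
        nodes[f"({a},{b})"] = (merged, na + nb)  # pseudo-subcategory
        merges.append((a, b, rho))
    return merges

# Hypothetical closeness profiles over four clusters (assumption).
profiles = {
    "Computers": [0.9, 0.1, 0.0, 0.2],
    "Electronics": [0.8, 0.2, 0.1, 0.2],
    "Dolls": [0.0, 0.1, 0.9, 0.3],
    "Books": [0.1, 0.0, 0.8, 0.5],
}
merges = average_linkage(profiles)
```

The list `merges` contains $n - 1$ entries, and the correlation recorded at each merge gives the height of the corresponding node in the dendrogram.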



C. Rearrangement of subcategories in the 
dendrogram 

The resulting dendrogram is shown in the upper part of Fig. 11; it is considerably different from the traditional classification scheme shown in the lower part of the figure. We now discuss the details of the correlations between the subcategories in the dendrogram. For this discussion, we divide the entire tree structure into six branches, denoted by (A)-(F).

To be specific, branch (A) covers a broad range of different collectibles. The relationship between the subcategories may be attributed to collecting manias. Branch (B) mainly covers three types of subcategories: clothing and accessories; business, office and industries; and sports categories. Branch (C) consists of three parts. The first part comprises antiques and items used for decorating homes, and the second part covers a very broad variety of items. The third part is interesting: it covers a set of electronic products such as computers, cameras, and audio players, and it also includes video games as well as food and beverages. At first glance, one may wonder how the latter two kinds of items are correlated; however, considering that some video game enthusiasts consume food and beverages while playing, one can find the reason. Thus, the dendrogram indeed reflects the bidding patterns of individual bidders. Branch (D) covers items related to artistic collections and hobbies. Branch (E) covers books, dolls for children, etc. Finally, branch (F) mainly covers collectibles in a wide range from jewelry to stamps.



V. CONCLUSIONS AND DISCUSSION 

Based on the empirical data collected from the eBay web site, we have constructed a bipartite network comprising bidders and items. The bipartite network is converted into two single-species networks, the bidder and item networks. We measured various topological properties of each network. Both networks are scale free in the degree distribution. It is noteworthy that both networks are assortatively mixed with regard to the degree correlation. This fact implies that active bidders tend to bid simultaneously for common items; therefore, they are connected. Accordingly, attractive items are connected via such active bidders. Next, by applying the hierarchical agglomeration algorithm, we identified clusters in the bidder and item networks. The clusters are well separated from each other. Then, we calculated the correlation matrix between subcategories by using the information on the fraction of items in each subcategory in a given cluster. By using this correlation matrix, we constructed the dendrogram, which is different from the traditional classification scheme. Based on a detailed investigation of the items closely located in the dendrogram, we find that the dendrogram is indeed bidder-oriented in an online auction. Therefore, the dendrogram could be useful for marketing renovation, resulting in an increase in profits.

FIG. 10: The closeness between the clusters and subcategories. The x-axis represents the 192 subcategories and the y-axis represents the closeness. For the largest cluster (cluster 1), subcategory (a), Sports and Goods, exhibits the largest closeness. This result indicates that the main fraction of items in cluster 1 originates from subcategory (a), even though small fractions of items come from other subcategories. For cluster 2, the subcategories of Clothing & Accessories (b), Women Clothing (c), and Movies (d) are the major fractions. The fact that these three subcategories belong to the same cluster implies that they are strongly correlated in online transactions. For cluster 3, the subcategory of Sports Trading Cards (e) is dominant. For cluster 4, subcategories (f), (g), (h), and (i) exhibit a strong correlation. For cluster 5, subcategories (j), (k), (l), and (m) are correlated, which represent the Pop Culture of Collectibles, Computers, Consumer Electronics, and Movies, respectively.

FIG. 11: Upper part: the dendrogram constructed by using the hierarchical clustering algorithm for the item network of an eBay online auction. Subcategories in branches (A)-(F) are explained in the text. Middle part: the closenesses of each subcategory for different clusters are shown with various concentrations. Lower part: the traditional classification scheme of subcategories in the version where the original data were collected. The classification scheme is a bilayer structure comprising 18 categories and 192 subcategories. For visual clarity, however, the bilayer structure is shown in a multilayer manner. For comparison, the subcategories are in the same order as that used in the upper part. We can easily observe that the traditional classification scheme is entangled from the bidder-oriented perspective.